Chapter 23

Survival Regression

IN THIS CHAPTER

Knowing when to use survival regression

Grasping the concepts behind survival regression

Running and interpreting the outcome of survival regression

Peeking at prognosis curves

Estimating sample size for survival regression

Survival regression is one of the most commonly used techniques in biostatistics. It overcomes the limitations of the log-rank test (see Chapter 22) and allows you to analyze how survival time is influenced by one or more predictors (the X variables), which can be categorical or numerical. In this chapter, we introduce survival regression. We specify when to use it, describe its basic concepts, and show you how to run survival regressions in statistical software and interpret the output. We also explain how to build prognosis curves and estimate the sample size you need to support a survival regression.

Note: Because time-to-event data so often describe actual survival, where the event of interest is death, we use the terms death and survival time in this chapter. But everything we say about death applies to the first occurrence of any event, such as pre-diabetes patients restoring their blood sugar to normal levels or cancer survivors suffering a recurrence of cancer.

Knowing When to Use Survival Regression

In Chapter 21, we examine the special problems that come up when the researcher can't follow a participant long enough to observe whether or not they ever experience the event being studied. To recap, in this situation you should censor that participant's data. This means acknowledging that the participant was observed, event-free, for only a limited amount of time and was then lost to follow-up. In that chapter, we also explain how to summarize survival data using life tables and the Kaplan-Meier method, and how to graph time-to-event data as survival curves. In Chapter 22, we describe the log-rank test, which you can use to compare survival among a small number of groups. Examples include participants taking a drug versus a placebo, or participants initially diagnosed at four different stages of the same cancer.
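To make the recap of censoring concrete, here is a minimal Python sketch, not taken from Chapter 21. It shows how censored time-to-event data are commonly recorded, as a follow-up time plus an event flag. It also shows how the Kaplan-Meier (product-limit) estimate treats censored participants. The function name `kaplan_meier` and the six-participant data set are hypothetical and serve only as illustration. Real analyses would normally use a statistical package.

```python
# Minimal Kaplan-Meier (product-limit) sketch for right-censored data.
# Each participant is recorded as (time, event):
#   event = 1 -> death observed at that time
#   event = 0 -> censored at that time (known to be alive up to then, outcome unknown after)


def kaplan_meier(times, events):
    """Return a list of (time, estimated survival probability) at each death time."""
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Gather everyone whose follow-up ends at this same time t.
        # Participants censored at t still count as at risk at t (the usual convention).
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            # Multiply by the conditional probability of surviving past time t.
            survival *= 1 - deaths / n_at_risk
            curve.append((t, survival))
        # Deaths and censored participants both leave the risk set after time t.
        n_at_risk -= leaving
    return curve


if __name__ == "__main__":
    # Hypothetical follow-up times (months) and event flags for six participants.
    times = [2, 3, 3, 5, 8, 9]
    events = [1, 1, 0, 1, 0, 1]
    for t, s in kaplan_meier(times, events):
        print(f"t={t}: S(t)={s:.3f}")
    # Prints t=2: 0.833, t=3: 0.667, t=5: 0.444, t=9: 0.000
```

The censored participants (event 0) never lower the survival estimate themselves. They only shrink the number at risk for later death times. That is how the estimate reflects participants who were observed for only a limited amount of time.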

But the log-rank test has limitations:

The log-rank test doesn't handle numerical predictors well. Because this test compares survival among a small number of categories, it does not work well for a numerical variable like age. To compare survival among different age groups with the log-rank test, you would first have to categorize the participants into age ranges. The age ranges you choose for your groups should be